
116 research outputs found

An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric

Author: Alistair Moffat
Collins-Thompson Kevyn
Sakai Tetsuya
Voorhees Ellen M.
Yang Hui
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/08/2018

Many evaluation metrics have been defined to evaluate the effectiveness of ad-hoc retrieval and search result diversification systems. However, it is often unclear which evaluation metric should be used to analyze the performance of retrieval systems given a specific task. Axiomatic analysis is an informative mechanism to understand the fundamentals of metrics and their suitability for particular scenarios. In this paper, we define a constraint-based axiomatic framework to study the suitability of existing metrics in search result diversification scenarios. The analysis informed the definition of Rank-Biased Utility (RBU) -- an adaptation of the well-known Rank-Biased Precision metric -- that takes into account redundancy and the user effort associated with the inspection of documents in the ranking. Our experiments over standard diversity evaluation campaigns show that the proposed metric captures quality criteria reflected by different metrics, making it suitable in the absence of knowledge about particular features of the scenario under study. Comment: Original version: 10 pages. Preprint of full paper to appear at SIGIR'18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, July 8-12, 2018, Ann Arbor, MI, USA. ACM, New York, NY, US
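The abstract above describes RBU as an adaptation of Rank-Biased Precision (RBP). As background only, here is a minimal sketch of standard RBP, in which a user moves from one rank to the next with a fixed persistence probability. It does not implement RBU itself, since the abstract does not specify its redundancy and effort terms. The function name and the `persistence` parameter name are illustrative assumptions.

```python
from typing import Sequence


def rank_biased_precision(relevance: Sequence[float], persistence: float = 0.8) -> float:
    """Standard RBP: (1 - p) * sum over ranks i (0-based) of rel_i * p**i.

    `relevance` holds per-rank gains in [0, 1], best-ranked document first.
    `persistence` (p) is the probability that the user continues to the next rank.
    """
    if not 0.0 <= persistence < 1.0:
        raise ValueError("persistence must be in [0, 1)")
    discounted_gain = sum(rel * persistence ** rank for rank, rel in enumerate(relevance))
    return (1.0 - persistence) * discounted_gain


# Relevant documents at ranks 1 and 3, with p = 0.5:
# 0.5 * (1 + 0 + 0.25) = 0.625
print(rank_biased_precision([1, 0, 1], persistence=0.5))
```

Lower persistence values concentrate the score on the top of the ranking, while values close to 1 model a patient user who inspects many results.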

arXiv.org e-Print Archive

Crossref

Controlling Risk of Web Question Answering

Author: Devlin Jacob
Dunn Matthew
Ferrucci David
Gal Yarin
Geifman Yonatan
Guo Chuan
Lai Guokun
Levy Omer
Malinin Andrey
Nguyen Tri
Richardson Matthew
Vinyals Oriol
Voorhees Ellen M.
Wang Shuohang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/07/2019

Web question answering (QA) has become an indispensable component in modern search systems, which can significantly improve users' search experience by providing a direct answer to users' information needs. This can be achieved by applying machine reading comprehension (MRC) models over the retrieved passages to extract answers with respect to the search query. With the development of deep learning techniques, state-of-the-art MRC performance has been achieved by recent deep methods. However, existing studies on MRC seldom address the predictive uncertainty issue, i.e., how likely the prediction of an MRC model is to be wrong, leading to uncontrollable risks in real-world Web QA applications. In this work, we first conduct an in-depth investigation of the risk of Web QA. We then introduce a novel risk control framework, which consists of a qualify model for uncertainty estimation using the probe idea, and a decision model for selective output. For the evaluation of risk-aware Web QA, we introduce risk-related metrics rather than the traditional EM and F1 used in MRC. The empirical results over both the real-world Web QA dataset and the academic MRC benchmark collection demonstrate the effectiveness of our approach. Comment: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieva

arXiv.org e-Print Archive

Crossref

Analyzing requirements and traceability information to improve bug localization

Author: Git
Marcus A.
Rath Michael
von Knethen A.
Voorhees Ellen M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/05/2018

Crossref

Institutional Knowledge at Singapore Management University

Automatic Ground Truth Expansion for Timeline Evaluation

Author: Aslam Javed
Aslam Javed A
Dang Hoa Trang
Hoa Dang
Kedzie Chris
Lin Chin-Yew
Lin Jimmy
Nenkova Ani
Paul Over
Voorhees Ellen M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

The development of automatic systems that can produce timeline summaries by filtering high-volume streams of text documents, retaining only those that are relevant to a particular information need (e.g. topic or event), remains a very challenging task. To advance the field of automatic timeline generation, robust and reproducible evaluation methodologies are needed. To this end, several evaluation metrics and labeling methodologies have recently been developed, focusing on information nugget or cluster-based ground truth representations, respectively. These methodologies rely on human assessors manually mapping timeline items (e.g. tweets) to an explicit representation of what information a 'good' summary should contain. However, while these evaluation methodologies produce reusable ground truth labels, prior works have reported cases where such labels fail to accurately estimate the performance of new timeline generation systems due to label incompleteness. In this paper, we first quantify the extent to which timeline summary ground truth labels fail to generalize to new summarization systems, then we propose and evaluate new automatic solutions to this issue. In particular, using a depooling methodology over 21 systems and across three high-volume datasets, we quantify the degree of system ranking error caused by excluding those systems when labeling. We show that when considering lower-effectiveness systems, the test collections are robust (the likelihood of systems being mis-ranked is low). However, we show that the risk of systems being mis-ranked increases as the effectiveness of systems held out from the pool increases. To reduce the risk of mis-ranking systems, we also propose two different automatic ground truth label expansion techniques. Our results show that our proposed expansion techniques can be effective for increasing the robustness of the TREC-TS test collections, markedly reducing the number of mis-rankings by up to 50% on average among the scenarios tested

Crossref

Relevance Prediction from Eye-movements Using Semi-interpretable Convolutional Neural Networks

Author: Brouwer Anne-Marie
Dario
David
Fahey Daniel
Gwizdka Jacek
Hardoon DR
Hunter John D
Liu Yiqun
Salojärvi Jarkko
Saracevic Tefko
Voorhees Ellen M
Widdel Heino
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/01/2020

We propose an image-classification method to predict the perceived-relevance of text documents from eye-movements. An eye-tracking study was conducted where participants read short news articles, and rated them as relevant or irrelevant for answering a trigger question. We encode participants' eye-movement scanpaths as images, and then train a convolutional neural network classifier using these scanpath images. The trained classifier is used to predict participants' perceived-relevance of news articles from the corresponding scanpath images. This method is content-independent, as the classifier does not require knowledge of the screen-content, or the user's information-task. Even with little data, the image classifier can predict perceived-relevance with up to 80% accuracy. When compared to similar eye-tracking studies from the literature, this scanpath image classification method outperforms previously reported metrics by appreciable margins. We also attempt to interpret how the image classifier differentiates between scanpaths on relevant and irrelevant documents

arXiv.org e-Print Archive

Crossref

Discrete deep learning for fast content-aware recommendation

Author: Chen Wenlin
He Ruining
Karatzoglou Alexandros
Koren Yehuda
Rendle Steffen
Salakhutdinov Ruslan
Vincent Pascal
Voorhees Ellen M
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

The cold-start problem and recommendation efficiency are regarded as two crucial challenges in recommender systems. In this paper, we propose a hashing-based deep learning framework called Discrete Deep Learning (DDL) that maps users and items to Hamming space, where a user's preference for an item can be efficiently calculated by Hamming distance; this computation scheme significantly improves the efficiency of online recommendation. In addition, DDL unifies the user-item interaction information and the item content information to overcome the issues of data sparsity and cold-start. More specifically, to integrate content information into our DDL framework, a deep learning model, a Deep Belief Network (DBN), is applied to extract an effective item representation from the item content information. The framework also imposes balance and irrelevant constraints on binary codes to derive compact but informative binary codes. Due to the discrete constraints in DDL, we propose an efficient alternating optimization method consisting of iteratively solving a series of mixed-integer programming subproblems. Extensive experiments have been conducted to evaluate the performance of our DDL framework on two different Amazon datasets, and the experimental results demonstrate the superiority of DDL over the state-of-the-art methods regarding online recommendation efficiency and cold-start recommendation accuracy
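As a rough illustration of the Hamming-space scoring this abstract describes, the sketch below ranks items by the Hamming distance between a user's binary code and each item's binary code. It covers only the lookup step, not the DDL training procedure. The codes are hand-picked integers rather than learned hashes, and the function names are hypothetical.

```python
from typing import Dict, List


def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(code_a ^ code_b).count("1")


def rank_items(user_code: int, item_codes: Dict[str, int]) -> List[str]:
    """Return item ids ordered from smallest to largest Hamming distance to the user."""
    return sorted(item_codes, key=lambda item: hamming_distance(user_code, item_codes[item]))


# Hypothetical 4-bit codes; a smaller distance is read as a stronger preference.
user = 0b1011
items = {"a": 0b1011, "b": 0b0000, "c": 0b1001}
print(rank_items(user, items))  # distances: a=0, c=1, b=3
```

Because the distance reduces to an XOR followed by a bit count, scoring an item costs a few machine instructions, which is the efficiency argument made for hashing-based recommenders.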

Crossref

University of Queensland eSpace

Patent Retrieval in Chemistry based on semantically tagged Named Entities

Author: Buckland Lori P.
Fluck Juliane
Friedrich Christoph M.
Gurulingappa Harsha
Hofmann-Apitius Martin
Klinger Roman
Mevissen Heinz-Theo
Müller Bernd
Voorhees Ellen M.
Publication date: 01/01/2009

Gurulingappa H, Müller B, Klinger R, et al. Patent Retrieval in Chemistry based on semantically tagged Named Entities. In: Voorhees EM, Buckland LP, eds. The Eighteenth Text RETrieval Conference (TREC 2009) Proceedings. Gaithersburg, Maryland, USA; 2009. This paper reports on the work that has been conducted by Fraunhofer SCAI for Trec Chemistry (Trec-Chem) track 2009. The team of Fraunhofer SCAI participated in two tasks, namely Technology Survey and Prior Art Search. The core of the framework is an index of 1.2 million chemical patents provided as a data set by Trec. For the technology survey, three runs were submitted based on semantic dictionaries and noun phrases. For the prior art search task, several fields were introduced into the index that contained normalized noun phrases, biomedical as well as chemical entities. Altogether, 36 runs were submitted for this task that were based on automatic querying with tokens, noun phrases and entities along with different search strategies

Fraunhofer-ePrints

Publications at Bielefeld University

A Phase II Trial of AZD6244 (Selumetinib, ARRY-142886), an Oral MEK1/2 Inhibitor, in Relapsed/Refractory Multiple Myeloma

Author: Annunziata Christina M.
Badros Ashraf Z.
Baz Rachid
Bose Prithviraj
Chen Jin-Qiu
Doyle Austin
Grant Steven
Herrmann Michelle
Hogan Kevin T.
Holkova Beata
Kmieciak Maciej
Korde Neha
Landgren Ola
Lin Hui-Yi
Raffeld Mark
Roberts John D.
Sankala Heidi
Shrader Ellen
Sullivan Daniel
Tombes Mary Beth
Voorhees Peter M.
Wan Wen
Weir-Wiggins Caryn
Wellons Martha
Xi Liqiang
Zhao Xiuhua
Zingone Adriana
Publication date: 01/01/2016

AZD6244 is a MEK1/2 inhibitor with significant preclinical activity in multiple myeloma (MM) cells. This phase 2 study used a two-stage Simon design to determine the AZD6244 response rate in patients with relapsed or refractory MM

Carolina Digital Repository